(Cartoon by DonkeyHotey via Flickr.)


Introduction

In this EDA (Exploratory Data Analysis) project I will explore a dataset of financial contributions to the 2016 elections, examining its structure, variables, patterns and the relationships between those variables.

My goal with this project is to find interesting insights that could lead to further investigations.
I will start by exploring a few one-variable distributions, then compare them against the ‘amount’ variable and look for interesting patterns and relationships between those variables.

The data in this project was taken from the fec.gov website (the Federal Election Commission). It includes all individual campaign contributions for the 2016 presidential elections and contributions from authorized committees.

The original dataset had 19 columns, which I reduced to 14 in order to limit the scope of this project.

Important: the ‘finance’ dataset loaded here was ‘munged’ in a separate file called ‘all-munge.R’, which can be found in the root folder of this project on Github. You can read more about the structure of this project in the readme.MD file, which is also located in the root folder.

The first variable to explore will be ‘amount’, the money a contributor donated to one or more of the candidates. It is the only column in the downloaded dataset that is numeric rather than character.

First, let’s take a look at the dataset, get familiar with the variables and ask questions about the data.

How many rows (individual contributions) and columns does the dataset have?
## [1] 7299993      14
##  [1] "cand_id"     "candidate"   "contributor" "city"        "state"      
##  [6] "zipcode"     "employer"    "occupation"  "amount"      "date"       
## [11] "tran_id"     "election_tp" "party"       "gender"

We can see above that the dataset now has 7,299,993 observations (individual contributions), with 14 columns describing each observation. The column names mean:

“cand_id” - candidate ID
“candidate” - Candidate name
“contributor” - Contributor name
“city” - Contributor city
“state” - Contributor state
“zipcode” - Contributor zipcode
“employer” - Contributor employer
“occupation” - Contributor occupation
“amount” - Amount contributed
“date” - Contribution transaction date
“tran_id” - The contribution transaction ID
“election_tp” - Election type (General or Primaries)
“party” - The political party of the candidate
“gender” - The contributor’s gender

Example of a few lines from the dataset

Show basic statistics of the variables

##       cand_id          candidate                 contributor     
##  P00003392:3486114   Clinton:3486114   TRUITT, ROBERTA :   1520  
##  P60007168:2021278   Sanders:2021278   BODNICK, KATIE  :   1313  
##  P80001571: 742733   Trump  : 742733   AMISIAL, WILFRID:   1078  
##  P60006111: 539457   Cruz   : 539457   PURCELL, LARRY  :    722  
##  P60005915: 244027   Carson : 244027   SMITH, DAVID    :    686  
##  P60006723:  98528   Rubio  :  98528   WILLIAMS, JAMES :    679  
##  (Other)  : 167856   (Other): 167856   (Other)         :7293995  
##             city                   state            zipcode     
##  NEW YORK     : 204203   california   :1294446   Min.   :    0  
##  LOS ANGELES  : 102524   new york     : 640831   1st Qu.:20815  
##  SAN FRANCISCO:  90577   texas        : 539992   Median :52245  
##  WASHINGTON   :  90229   florida      : 420024   Mean   :52643  
##  BROOKLYN     :  87279   washington   : 293342   3rd Qu.:88030  
##  SEATTLE      :  83548   massachusetts: 279133   Max.   :99999  
##  (Other)      :6641633   (Other)      :3832225   NA's   :378    
##                   employer                       occupation     
##  N/A                  : 988025   RETIRED              :1630191  
##  RETIRED              : 902186   NOT EMPLOYED         : 618935  
##  SELF-EMPLOYED        : 531510   INFORMATION REQUESTED: 238300  
##  NONE                 : 447954   ATTORNEY             : 198752  
##  NOT EMPLOYED         : 262109   TEACHER              : 140600  
##  INFORMATION REQUESTED: 238602   PHYSICIAN            : 111102  
##  (Other)              :3929607   (Other)              :4362113  
##      amount             date                            tran_id       
##  Min.   :      0   Min.   :2013-10-01   A4EA7F7D9338943869B5:      8  
##  1st Qu.:     15   1st Qu.:2016-03-02   AA2F3125A0DB141928EB:      8  
##  Median :     28   Median :2016-05-28   AAC874DDA3EA04584A39:      8  
##  Mean   :    127   Mean   :2016-05-19   AB37264C070244DDDBF7:      8  
##  3rd Qu.:     93   3rd Qu.:2016-09-04   SA17A.4143          :      7  
##  Max.   :4904861   Max.   :2016-12-31   A1F4C793991D1416D939:      6  
##                                         (Other)             :7299948  
##  election_tp             party            gender       
##  G2016:2593021   Democrat   :5514536   female:3678175  
##  P2016:4706972   Green      :   8926   male  :3621818  
##                  Independent:   1275                   
##                  Republican :1775256                   
##                                                        
##                                                        
## 

Many interesting points can be seen in the table above, and they can act as ‘trailheads’ for investigative avenues. Let’s look at a few of them:
It seems that Hillary Clinton, under the ‘candidate’ column, had the highest number of contributions, followed by Bernie Sanders and Donald Trump. Did she also lead in the total amount contributed, and not only in the number of contributions?
Other things we can see in this first glance at the number of contributions are:
  • New York is the leading city with 204,203 contributions.
  • California is the leading state, with the highest number of contributions (1,294,446).
  • Retired people take first place under the ‘occupation’ variable and second place under the ‘employer’ variable.
  • The Democratic party received about 3 times as many contributions as the Republican party (5,514,536 vs. 1,775,256).
  • The amounts donated ranged from almost nothing up to $4,904,861, which was made by a single contributor. I wonder who that was.
I will focus on only a few of the questions and variables above in the scope of this project, and drill down where the distributions and the connections between the variables need a closer look.

One variable exploration

Amount

Let’s see first how much money was contributed in these elections by all contributors.

## [1] 928335424

The sum of all contributions to all candidates in the 2016 elections was $928,335,424.

Plot the amount variable’s distribution


The first (left) plot looks uninformative, but in fact we can learn a few basic things about the ‘amount’ distribution from it. First, the x axis shows the huge gap between the highest and lowest contributions. Second, most of the contributions were not very far from 0 and certainly not in the millions. The second plot, drawn after dropping the top 3% of the contributions, confirms that most of the contributions were below $200. The median contribution in this distribution is $28. I will split donors into big and small at the $200 mark and check which candidates were supported by big and by small contributors.

The amount distribution after taking the log10


With fewer outliers and less variability, the data is easier to read, and its distribution now looks roughly normal.
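The two transformations described above (dropping the top 3% and taking log10) can be sketched in base R as follows. This is a minimal illustration on made-up amounts, not the code used for the plots; the toy vector stands in for the ‘amount’ column.

```r
# Toy stand-in for the 'amount' column (hypothetical values, not the real data).
amount <- c(5, 10, 15, 25, 25, 28, 50, 93, 100, 250, 2700, 4904861)

# Keep contributions at or below the 97th percentile (drops roughly the top 3%).
cutoff  <- quantile(amount, probs = 0.97)
trimmed <- amount[amount <= cutoff]

# log10 compresses the long right tail. Zero amounts are excluded first,
# because log10(0) is -Inf.
log_amount <- log10(amount[amount > 0])

summary(trimmed)
summary(log_amount)
```

Trimming changes which rows are shown, while the log transform keeps every positive row and only changes the scale, so the two plots answer slightly different questions.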

Candidate

Who were the candidates and which party did they represent?


We had 18 Republicans, 5 Democrats, 1 Independent and 1 Green, out of 25 candidates in the 2016 elections. Republican candidates outnumbered the Democrats by more than 3 to 1, and the Green and Independent parties by 18 to 1.
This party map of candidates brings up many questions. First, why did the Republicans have so many more candidates than the other major party?
Another obvious point is that mainly two parties were competing in these elections, while the small ones seemed to have a very slim chance of winning. This is not just because they fielded so few candidates; it is also because of the relatively small amounts those parties collected compared to the two big ones, which will be demonstrated later on.
The American political system has been a two-party system since its inception, from the Federalist and Democratic-Republican parties until today’s Democratic and Republican parties. An interesting question for further investigation is what chance a third party has of counting in the American political system, and whether we can learn this from the available data.

Facet the number of contributions by all candidates (up to $250)


Looking at the different histograms, Hillary Clinton seems to lead in the number of contributions, followed by Sanders and Trump. It is not really clear from this plot who comes next in descending order. It could be Rubio, Cruz, Bush or Carson. I will dive into who really received the highest number of contributions and who received the highest amount.

Compare the number of contributions per candidate with a bar plot


The bar plot says it all. Clinton led these elections in the number of contributions, followed by Sanders, Trump, Cruz, Carson, Rubio, Paul, Bush, Fiorina and Kasich, in this order. So, how many contributions exactly did each of the top 10 candidates receive?

Number of contributions and percent per candidate

## # A tibble: 25 x 3
##    candidate contributions percent
##    <chr>             <int>   <dbl>
##  1 Clinton         3486114   47.8 
##  2 Sanders         2021278   27.7 
##  3 Trump            742733   10.2 
##  4 Cruz             539457    7.39
##  5 Carson           244027    3.34
##  6 Rubio             98528    1.35
##  7 Paul              31170    0.43
##  8 Bush              27446    0.38
##  9 Fiorina           27410    0.38
## 10 Kasich            25166    0.34
## 11 Johnson           13184    0.18
## 12 Stein              8926    0.12
## 13 Walker             6519    0.09
## 14 Huckabee           6387    0.09
## 15 Christie           5782    0.08
## 16 O'Malley           5036    0.07
## 17 Graham             3712    0.05
## 18 Santorum           1676    0.02
## 19 Lessig             1326    0.02
## 20 McMullin           1275    0.02
## 21 Perry               896    0.01
## 22 Webb                782    0.01
## 23 Jindal              764    0.01
## 24 Pataki              323    0   
## 25 Gilmore              76    0

Hillary Clinton received 48% of all contributions, followed by Bernie Sanders with 28% and then Donald Trump with only 10%, across both the primaries and the general election. Clinton received about 4.7 times as many contributions as Trump, yet it did not help her win the race.
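A count-and-percent table like the one above can be produced with `table()` and `prop.table()`. The sketch below runs on a handful of made-up rows (the `candidate` vector is a hypothetical stand-in for the real column), and the last line rechecks Clinton’s share from the figures printed above.

```r
# Toy stand-in for the 'candidate' column (hypothetical rows).
candidate <- c("Clinton", "Clinton", "Clinton", "Sanders", "Sanders", "Trump")

# Count contributions per candidate, largest first, and convert to percentages.
counts  <- sort(table(candidate), decreasing = TRUE)
percent <- round(100 * prop.table(counts), 2)

by_candidate <- data.frame(
  candidate     = names(counts),
  contributions = as.integer(counts),
  percent       = as.numeric(percent)
)
by_candidate

# Clinton's share, recomputed from the counts shown in the table above.
clinton_share <- round(100 * 3486114 / 7299993, 1)
clinton_share
```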

Occupation

Which occupations gave the most donations?

## # A tibble: 13 x 4
##    occupation             number        sum percent
##    <chr>                   <int>      <dbl>   <dbl>
##  1 RETIRED               1630191 162078962.    22.3
##  2 NOT EMPLOYED           618935  31058668.     8.5
##  3 INFORMATION REQUESTED  238300  37703600.     3.3
##  4 ATTORNEY               198752  51511381.     2.7
##  5 TEACHER                140600   7892231.     1.9
##  6 PHYSICIAN              111102  19181435.     1.5
##  7 HOMEMAKER              107877  29961875.     1.5
##  8 PROFESSOR              101698  10090862.     1.4
##  9 CONSULTANT              85940  16439027.     1.2
## 10 ENGINEER                75766   8257328.     1  
## 11 SALES                   62491   5806454.     0.9
## 12 LAWYER                  56140  14769849.     0.8
## 13 MANAGER                 54363   6839639.     0.7


The table above cannot tell us much, since donors entered about 120,000 different occupations on their contribution forms. The field was free text with no restrictions, so many occupations were written many times in different variations.
In order to analyze this facet of the dataset, we would have to write an algorithm that searches for similar terms and combines them.
Nevertheless, the share of retired donors in the table above is pretty impressive, given that retirees were 14.5% of the population in 2016.
Also interesting is the high percentage of donors who listed themselves as ‘not employed’ at that time. I would think unemployed people would not have the money to donate, but they did, hundreds of thousands of times.
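One simple way to combine similar terms is to normalize case and whitespace, then map regular-expression patterns to a canonical label. The sketch below is only an illustration of that idea: the sample entries and the three rules are hypothetical, and a real cleanup would need many more rules (or fuzzy matching).

```r
# Hypothetical free-text entries; the real column has far more variants.
occupation <- c("Retired", "RETIRED ", "retired teacher", "ATTORNEY",
                "Lawyer", "attorney at law", "NOT EMPLOYED", "Unemployed")

# Step 1: normalize case and surrounding whitespace.
clean <- toupper(trimws(occupation))

# Step 2: map patterns to a canonical label; the first matching rule wins.
# These rules are illustrative assumptions, not the author's.
rules <- c("RETIRED"                 = "RETIRED",
           "ATTORNEY|LAWYER"         = "ATTORNEY",
           "NOT EMPLOYED|UNEMPLOYED" = "NOT EMPLOYED")

canonical <- clean
matched   <- rep(FALSE, length(clean))
for (pattern in names(rules)) {
  hit <- !matched & grepl(pattern, clean)
  canonical[hit] <- rules[[pattern]]
  matched <- matched | hit
}

table(canonical)
```

Note the design trade-off: with "first rule wins", an entry such as "retired teacher" is counted as RETIRED rather than TEACHER, so the order of the rules is itself an analytical decision.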

Contributions by occupation (above 50,000 contributions)



Gender

Who contributed more in those elections, men or women?


Women had a slight lead with the number of contributions.

Male and female contributions in numbers

## # A tibble: 2 x 2
##   gender contributions
##   <chr>          <int>
## 1 female       3678175
## 2 male         3621818
Women contributed 3,678,175 times and men contributed 3,621,818 times. It is interesting to note that women not only contributed more, they also voted more than men in those elections. According to the Center for American Women and Politics, women have voted more than men in every election since 1964.
Source: Center for American Women and Politics


Why did women vote or contribute more than men? Maybe it is related to the fact that the US population in 2016 was 51% women and 49% men? That is a very interesting question for further research on women’s involvement in political issues, which, unfortunately, is out of the scope of this project.

Two variable exploration

Sum of contributions and percent per candidate

## # A tibble: 25 x 4
##    candidate contributions        sum percent
##    <chr>             <int>      <dbl>   <dbl>
##  1 Clinton         3486114 480974942.   51.8 
##  2 Trump            742733 121000442.   13.0 
##  3 Sanders         2021278  92929614.   10.0 
##  4 Cruz             539457  69170922.    7.45
##  5 Rubio             98528  39775178.    4.28
##  6 Bush              27446  32961134.    3.55
##  7 Carson           244027  28633656.    3.08
##  8 Kasich            25166  14656219.    1.58
##  9 Christie           5782   8033299.    0.87
## 10 Fiorina           27410   6680714.    0.72
## # ... with 15 more rows

As we can see, only 8 candidates out of the 25 had more than 1% of the sum of all contributions. Hillary Clinton received 52% of the contributions, followed by Donald Trump with 13% and Bernie Sanders with 10%.

Contributors

Top 10 donors who contributed the highest total amounts

## # A tibble: 1,299,884 x 4
##    contributor                       count  average       sum
##    <chr>                             <int>    <dbl>     <dbl>
##  1 HILLARY VICTORY FUND - UNITEMIZED    14 3090797. 43271164 
##  2 SMITH, MICHAEL                      544     177.    96286.
##  3 MILLER, MICHAEL                     506     174.    88166.
##  4 BOCH, ERNIE                           1   86937.    86937.
##  5 SMITH, JAMES                        452     175.    79053.
##  6 SMITH, WILLIAM                      598     123.    73695.
##  7 SMITH, DAVID                        686     102.    69864.
##  8 BROWN, MICHAEL                      362     188.    67997.
##  9 WILLIAMS, DAVID                     376     178.    67053.
## 10 SMITH, ROBERT                       542     121.    65797.
## # ... with 1,299,874 more rows


In the 2016 elections, rich donors could contribute as much as $360,000 through Hillary Clinton’s campaign. Here is how it worked: donors who were rich - and willing - could give $5,400 to the Clinton campaign, $33,400 to the Democratic National Committee and $10,000 to each of the state parties (32 with Democratic committees), about $360,000 in all. A joint fundraising committee let the donor do it all with a single check.
On Jan. 1, the contribution limits reset for the party committees, and the Hillary Victory Fund could go back to its donors for another $350,000 in party funds.
While the maximum donation to a presidential campaign was $2,700 for the primary elections (plus another $2,700 for the general), the Hillary Victory Fund could accept much larger contributions because it was a so-called joint fundraising committee comprised of multiple committees.
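The limits quoted above add up as follows. This is plain arithmetic on the figures in the text; the variable names are mine.

```r
# Per-donor limits as quoted in the text above.
campaign_limit    <- 2700 * 2   # $2,700 for the primary plus $2,700 for the general
dnc_limit         <- 33400      # Democratic National Committee
state_party_limit <- 10000      # per state party
n_state_parties   <- 32

# Party money only (what could be collected again after the Jan. 1 reset).
party_total <- dnc_limit + state_party_limit * n_state_parties

# Everything a single donor could give through the joint fundraising committee.
grand_total <- campaign_limit + party_total

party_total   # 353400
grand_total   # 358800
```

So the party portion is roughly $350,000 and the full check is just under $360,000.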
So, the Hillary Victory Fund was not a real individual contributor, and it is an extreme outlier in our data. The lack of information about the real contributors must influence one or more of the analyses in this project. The HVF funneled large amounts of money into Hillary Clinton’s campaign, using the state committees as a legal channel to move money back and forth in order to reach the maximum amount per donor, leaving only 1% of the contributions with the state committees. As a result, we cannot tell from the data we have, which is the government’s official 2016 contributions database, who Clinton’s biggest donors were or how much they gave. Democratic donors, knowing the funds would end up with Clinton’s campaign, wrote six-figure checks to influence the election - 100 times larger than allowed. (from investor.com)
The actual big contributors, that were masked by the HVF, like Google, Facebook, JPMorgan Chase & Co, Stanford University, US Dept of State and others, can be found here.

Here are the rules for those years


As we can see above, on the FEC’s chart, there were loopholes in the system that allowed unlimited transfers of money between the state and the federal committees.

Number of candidates per contributor (more than 1 candidate)


35,209 people contributed to more than 1 candidate, out of 1,307,046 recorded unique contributors, which is 2.7%. We can see that as the number of candidates goes up, the number of donors goes down, which seems logical. Who were the donors who contributed to the largest number of candidates?
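Counting the distinct candidates each contributor gave to can be sketched with `tapply()`. The data frame below is a made-up stand-in with the same two column names as the dataset. Keep in mind that grouping by name alone merges different people who share a name, so the real counts are approximate.

```r
# Toy stand-in for the 'contributor' and 'candidate' columns (hypothetical rows).
finance <- data.frame(
  contributor = c("DOE, JANE", "DOE, JANE", "DOE, JANE",
                  "ROE, RICHARD", "POE, ANN", "POE, ANN"),
  candidate   = c("Bush", "Rubio", "Bush", "Clinton", "Clinton", "Sanders")
)

# Number of distinct candidates per contributor.
n_candidates <- tapply(finance$candidate, finance$contributor,
                       function(x) length(unique(x)))

# Contributors who gave to more than one candidate, and their share of all contributors.
multi <- n_candidates[n_candidates > 1]
share <- round(100 * length(multi) / length(n_candidates), 1)

sort(multi, decreasing = TRUE)
share
```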

Contributors who donated to maximum candidates

## # A tibble: 6 x 4
## # Groups:   contributor [6]
##   contributor        city           candidates    sum
##   <chr>              <chr>               <int>  <dbl>
## 1 WILSON, KIRK       DALLAS                  9 11730.
## 2 CALABRESI, STEVEN  PROVIDENCE              8 24300 
## 3 DRUMMOND, SARA     MONTALBA                8  6700 
## 4 AGRON, DOMINICK    DINGMANS FERRY          7  4154.
## 5 FRIESS, FOSTER MR. JACKSON                 7 18900 
## 6 BRYANT, GORDON     BEAUFORT                6  2025


Kirk Wilson (listed as WILSON, KIRK), from Dallas, Texas (there were a couple of Kirk Wilsons in this database), donated to the largest number of candidates: 9. Let’s see some more information about him and his contributions with a plot.

The #1 multi-candidate contributor


In 2015, Kirk Wilson contributed first to Fiorina and Huckabee and ended with Bush and Christie, giving to Bush 3 times. He then halted his contributions until the end of November, when he gave to Trump twice. I wonder why, as an obvious Republican supporter, he didn’t give to Trump throughout 2016.

Big and small donors

I will now look into Hillary Clinton’s well-known claim that her campaign relied on small donations (less than $100). I doubled that number and cut the data at the $200 mark (as other sources suggested) as the point that separates big and small donors.

How many of Clinton’s donors were small and how many were big, then?

## 
## above $200 below $200 
##     325147     135582


As we can see above, Clinton had about 2.4 times as many donors above $200 as below it, contrary to her claim. I wonder what the ratio is for Trump and Sanders, her two main opponents in the two elections.

Trump’s ratio between big and small donors

## 
## above $200 below $200 
##     110134     381298


Trump had almost 3.5 times more small donors than big donors!

Sanders’s ratio between big and small donors

## 
## above $200 below $200 
##     117120     102076


Sanders had almost the same number of small and big contributors. He had about 1.15 times as many big donors as small ones.
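The split used above can be sketched as follows. Judging by the ‘split_200’ labels in the repeating-contributors table further down, the cut appears to be applied to each donor’s total per candidate rather than to single contributions, so that is what this sketch assumes; treating exactly $200 as "below" is also my assumption. The rows are made up.

```r
# Toy stand-in for the dataset (hypothetical rows).
finance <- data.frame(
  contributor = c("A", "A", "B", "C", "C", "C", "D"),
  candidate   = c("Clinton", "Clinton", "Clinton", "Trump", "Trump", "Trump", "Trump"),
  amount      = c(150, 100, 50, 10, 20, 30, 2700)
)

# Total given by each donor to each candidate.
totals <- aggregate(amount ~ contributor + candidate, data = finance, FUN = sum)

# Label each donor by whether their total exceeds $200 (assumed: strictly above).
totals$split_200 <- ifelse(totals$amount > 200, "above $200", "below $200")
split_tab <- table(totals$candidate, totals$split_200)
split_tab

# Ratios recomputed from the three tables printed above.
clinton_big_to_small <- round(325147 / 135582, 1)   # Clinton: big vs. small
trump_small_to_big   <- round(381298 / 110134, 1)   # Trump: small vs. big
sanders_big_to_small <- round(117120 / 102076, 2)   # Sanders: big vs. small
```

Donor A shows why the unit matters: two contributions of $150 and $100 are each "small", but the donor’s $250 total lands above the mark.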
Let’s see the distribution of contributions above and below $200 for all candidates in a graph.

Big and small contributors per candidate (split on $200)


It seems that every candidate except Donald Trump received more money from ‘big donors’ than from small ones in the 2016 elections. Trump far surpassed the rest of the candidates in small-donor contributions. Clinton, on the other hand, was the biggest recipient of big donations, while Sanders, Cruz and Carson received a more balanced ratio of contributions from small and big donors.
Working on the above data, I noticed that some people contributed more than once. Let’s see who they were.

Repeating contributors

## # A tibble: 6 x 6
## # Groups:   contributor [6]
##   contributor         candidate count average   sum split_200 
##   <chr>               <chr>     <int>   <dbl> <dbl> <chr>     
## 1 TRUITT, ROBERTA     Clinton    1520       1 1520  above $200
## 2 BODNICK, KATIE      Clinton    1313       4 5465. above $200
## 3 AMISIAL, WILFRID    Clinton    1078       3 3526. above $200
## 4 PURCELL, LARRY      Sanders     705       4 3138. above $200
## 5 SAUNDERS, ELIZABETH Clinton     675       6 4324. above $200
## 6 SCHWARTZ, HILARY    Clinton     622       7 4429. above $200

Wow! Some people contributed hundreds of times. Roberta Truitt, who leads this table, donated 1,520 times to the Clinton campaign, with an average of $1 per donation. There can be many reasons for that. It could be an automated system that makes online contributions on a person’s behalf, or an army of trolls pumping up the number of contributions for their candidate. An interesting question for me here is which candidate had the highest number of repeating contributors. I will define extreme repeating contributors as those who donated more than 100 times.

Candidates and repeating contributors

## # A tibble: 8 x 3
##   candidate sum_count average
##   <chr>         <int>   <dbl>
## 1 Clinton      199062    149.
## 2 Sanders       76739    137.
## 3 Cruz           8453    143.
## 4 Trump           392    131.
## 5 Johnson         243    122.
## 6 Rubio           217    108.
## 7 Fiorina         107    107 
## 8 Carson          104    104


Hillary Clinton was ahead of everyone else with almost 200K ‘extreme contributions’, followed by Sanders with about 77K. The number at the top of each bar is the average number of contributions per extreme donor.
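The two columns of the table above can be reproduced in two steps: count contributions per donor and candidate, keep the donors with more than 100, then sum and average those counts per candidate. A minimal base R sketch on made-up rows:

```r
# Toy stand-in: one row per contribution (hypothetical donors).
finance <- data.frame(
  contributor = c(rep("A", 150), rep("B", 120), rep("C", 3), rep("D", 101)),
  candidate   = c(rep("Clinton", 270), rep("Sanders", 104))
)

# Step 1: number of contributions per donor and candidate.
finance$n <- 1L
per_donor <- aggregate(n ~ contributor + candidate, data = finance, FUN = sum)

# Step 2: keep "extreme" donors (more than 100 contributions).
extreme <- per_donor[per_donor$n > 100, ]

# Step 3: total and average contributions of extreme donors, per candidate.
sum_count <- tapply(extreme$n, extreme$candidate, sum)
average   <- tapply(extreme$n, extreme$candidate, mean)

data.frame(candidate = names(sum_count),
           sum_count = as.integer(sum_count),
           average   = as.numeric(average))
```

When a candidate has a single extreme donor, the two columns coincide, which matches the Fiorina and Carson rows above.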

Date

Amount donated at the years and months leading to the elections


Red and blue lines, respectively, are the Republican and Democratic primaries and the green line is the general election.

People started to donate as early as 2013, but in very small numbers, as can be seen further down. Most donors contributed between early 2015 and November 2016. Some kept on giving even after the election, but that died out around the end of 2016. We can see a steady build-up in the amounts donated, leading to the highest amounts in the months and days before the general election. There was a peak in contributions between February and June of 2016 and a drop right after. This might be related to the Republican and Democratic primaries, which took place between February and June 2016.
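A build-up or peak like this is easier to confirm with monthly totals than by eye. A small sketch, assuming a `date` column of class Date and an `amount` column (the rows are made up):

```r
# Toy stand-in for the 'date' and 'amount' columns (hypothetical rows).
finance <- data.frame(
  date   = as.Date(c("2016-02-01", "2016-02-15", "2016-03-02", "2016-11-01")),
  amount = c(25, 50, 100, 10)
)

# Bucket each contribution by year-month and sum the amounts per bucket.
finance$month <- format(finance$date, "%Y-%m")
monthly <- aggregate(amount ~ month, data = finance, FUN = sum)
monthly
```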

Party/Gender

Number of contributions parties received by gender (Democrats and Republicans)



Women contributed 1.2 times more often than men to the Democratic party. On the other side of the aisle, Republican men contributed 1.7 times more often than women. The Green party had an even wider gap between men’s and women’s contributions: men contributed twice as often as women to that party. The Independent party was the only one with an almost identical number of contributions from men and women. We can also see here that the Democrats received the highest number of contributions. Did they also receive the highest amount?

Sum of contributions parties received

## # A tibble: 4 x 2
##   party       sum_contrib
##   <chr>             <dbl>
## 1 Democrat     578843055.
## 2 Republican   348029605.
## 3 Green          1119156.
## 4 Independent     343608.


Democrats received $579M, about 1.7 times as much as the Republicans ($348M). The Green party received $1.1M, more than 3 times as much as the Independent party.

Did women contribute more times to Hillary Clinton than men?


Women donated to Clinton 1.5 times more often than men, and men donated to Trump 1.7 times more often than women. It seems that gender played a pretty dominant role in contributions to those two candidates.

Multi-variable exploration

Election type

Facet contributions and dates by election type (General and Primaries)


Now, looking at the distribution of the donations, the donation pattern looks clearer. Donations were mostly given prior to an election. The assumption that the contributions peak we saw in the previous plot, between February and June 2016, is related to the primaries was correct.

Number of contributions by gender and party in numbers and in a plot

## # A tibble: 8 x 3
## # Groups:   party [4]
##   party       gender num_contrib
##   <chr>       <chr>        <int>
## 1 Democrat    female     3014728
## 2 Democrat    male       2499808
## 3 Republican  male       1115394
## 4 Republican  female      659862
## 5 Green       male          5966
## 6 Green       female        2960
## 7 Independent male           650
## 8 Independent female         625
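The gender ratios discussed in this report can be recomputed directly from the counts in the table above. The matrix below simply re-enters those eight numbers.

```r
# Counts copied from the table above (rows: party, columns: gender).
num_contrib <- matrix(
  c(3014728, 2499808,
     659862, 1115394,
       2960,    5966,
        625,     650),
  ncol = 2, byrow = TRUE,
  dimnames = list(party  = c("Democrat", "Republican", "Green", "Independent"),
                  gender = c("female", "male"))
)

# Male-to-female ratio of the number of contributions, per party.
male_to_female <- round(num_contrib[, "male"] / num_contrib[, "female"], 2)
male_to_female

# For the Democrats the ratio is easier to read the other way round.
dem_female_to_male <- round(num_contrib["Democrat", "female"] /
                            num_contrib["Democrat", "male"], 2)
dem_female_to_male
```

A ratio near 1 (Independent) means an even split; values further from 1 in either direction mean a wider gender gap.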

Contributions by party over time


Looking at the faceted data above, the trend we saw earlier, with contributions growing over time and closer to the general election, is missing from the Republican party. On the Republican side there actually seems to be a downward trend towards the general election.

Early contributions (2013-2014)


3 out of the 4 candidates who received early donations were Republicans: Cruz, Paul and Rubio. Rubio was the only one who received contributions in 2013 and through most of 2014. Did starting early help Rubio? Let’s see how much money each candidate collected along the way to the elections.

Candidate/date

Accumulative sum of contributions candidates received

(active chart)


We can clearly see above that throughout the election cycle Clinton had a very strong financial lead over her rivals. It is interesting to see where each line ends in the graph above, which looks like it marks the candidate’s drop-out from the race. Trump received contributions for about two months after the general election. Why would he need more campaign contributions after he had already won the race? Interestingly enough, after some online research I found that all of the candidates kept on receiving donations even after they suspended their campaigns. So, why would any of the candidates who lost keep on receiving donations?
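A running total per candidate, like the one plotted above, can be built by sorting by date and applying `cumsum()` within each candidate. This is a sketch on made-up rows, not the code behind the chart.

```r
# Toy stand-in for the dataset (hypothetical rows).
finance <- data.frame(
  candidate = c("Clinton", "Trump", "Clinton", "Trump", "Clinton"),
  date      = as.Date(c("2016-01-05", "2016-01-03", "2016-01-01",
                        "2016-01-04", "2016-01-03")),
  amount    = c(100, 50, 200, 25, 10)
)

# Order by candidate and date so the running total follows time.
finance <- finance[order(finance$candidate, finance$date), ]

# ave() applies cumsum() separately within each candidate's rows.
finance$cum_sum <- ave(finance$amount, finance$candidate, FUN = cumsum)
finance
```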

State/City

A map with the sum of contributions by state and by city



Leading the state contributions are California with $160M, New York with $130M, Texas with $85M and Florida with $62M. As for cities, Palm Beach pops up first, with the size of its red dot and its dark color. Did the amount contributed from each state reflect the size of its population? Let’s take a look at it with 2 charts. In a further investigation I would add a few variables here, like gender and party, and look for hints of relationships between all the variables.

Is there a correlation between the state’s population and number of contributions?


We can see that there is a very strong correlation (0.935) between the number of contributions per state and the number of residents of that state. In a further investigation I would analyze the correlation between cities and their financial contributions to the different parties.
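A correlation like this is a single call to `cor()` once both quantities sit in one row per state. The numbers below are invented purely to show the mechanics; they are not state populations or contribution counts from the dataset.

```r
# Hypothetical per-state figures, for illustration only.
states <- data.frame(
  state         = c("S1", "S2", "S3", "S4", "S5"),
  population    = c(1, 2, 4, 8, 10),      # made-up population, in millions
  contributions = c(12, 18, 45, 70, 95)   # made-up contributions, in thousands
)

# Pearson correlation between population and number of contributions.
r <- cor(states$population, states$contributions)
round(r, 3)

# Contributions per million residents removes the effect of sheer size.
states$per_million <- states$contributions / states$population
states[order(-states$per_million), c("state", "per_million")]
```

Since bigger states almost always produce more of everything, a per-capita rate is often the more revealing comparison.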

Reflection

Exploring the 2016 elections’ finances dataset and describing my findings in numbers, plots and maps was a great challenge! It was a great opportunity to practice plotting with built-in R and with useful external packages like plyr, dplyr, ggplot2, data.table and others.
The work on this dataset also taught me a lot of things I was not aware of, despite them being publicly available and me being an avid follower of politics.

Findings

Data can be misleading if it is not connected to the real-life events behind the topic being investigated. For example, the Clinton campaign had many contributions from many contributors coming in under only one name (Hillary Victory Fund). The ‘contributor’ HVF was an outlier that skewed the data by masking all those contributors. However, removing this ‘contributor’ from the dataset would also have removed the sum of the donations it encompasses, which in this case was $43M, not a negligible amount, so I left it as it is and worked my way around it.

Trivial findings
  • Hillary Clinton led all the way in the amount and number of contributions, which did not ‘buy’ her the White House.
  • Republican candidates outnumbered those of the other parties.
  • The American political system is made up of 2 main parties and leaves small parties ‘out of the game’.
  • $25 was the most frequent contribution.
  • Hillary Clinton relied more on ‘big donations’ than on small donations.
  • Men and women contributed more or less the same number of times in those elections in general.
  • There were hardly any contributions during the weekends compared to the weekdays, and there was a spike in contributions at the end of each month.
  • Contributions started to come in as early as 2013.
  • After Trump’s and Clinton’s announcements, more contributions came in.
  • Most of the general election’s contributions were given during the 4 months before election day, starting after the primaries ended and the final 2 candidates were chosen.
  • There was a very strong correlation between the number of contributions from a state and the number of people living in that state.
Surprising findings
  • Bernie Sanders had more contributions for the primaries than the presidential winner had throughout the entire election season.
  • Hillary Clinton received 48% of the contributions and 52% of the sum of all the contributions, while her opponent received only 10% of the contributions and 13% of the sum.
  • Hillary Clinton’s campaign allegedly received $84M in illegal campaign contributions from rich donors, who would not have been able to donate as much as $360K if not for this special ‘arrangement’ by the HVF and the DNC.
  • 35K people donated to more than one candidate.
  • Donald Trump had 3.5 times more small donors than big ones and was the only candidate who had more small donors than big ones. Bernie Sanders had more or less the same number of big and small contributors. I was surprised by this statistic, since I was always sure that he led the small donations’ realm with his grassroots movement.
  • Some people gave hundreds of times, and 3 even gave more than a thousand times. Hillary Clinton led this list with 199K repeating contributions, followed by Bernie Sanders with 77K.
  • Retired and unemployed people contributed more than everyone else. Note that this facet of the analysis is not complete, since the data in this column is far from clean.
  • Contributions kept coming in for a short while after the elections.
  • Gender played a major role in contributions to 3 out of the 4 parties, especially the Republican and Green parties, where there were many more contributions from men than from women. As for the two final candidates, Hillary Clinton had 1.5 times more contributions from women and Donald Trump had 1.7 times more contributions from men.

Challenges

I started the project with a dataset of California financial contributions, the state I live in, but found it lacking data that is available on the national level, knowing that I could always go back and drill down into the state’s data. It seemed like more of a challenge to work with the national dataset and it indeed was exactly that.

Working with more than 7 million rows on a laptop was very time consuming at the beginning, but I found better ways and tools for a given task to overcome the resource obstacle. For example, I experienced issues with dplyr and knitr, so I moved to sqldf, which worked. Still slow, but it worked. I also used built-in R functions, as in the “Percent of contributions per gender” block above, but I ended up writing most of the code with dplyr, for its straightforward style. In order to improve the workflow, I also created a sample file, which I used to test time-consuming code chunks.
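Creating a sample file of this kind can be as simple as drawing a random subset of rows with a fixed seed, so the sample is the same on every run. A sketch under assumptions: the 1% fraction, the seed and the toy data frame are mine, not taken from the project.

```r
# Toy stand-in for the full dataset (hypothetical values).
set.seed(2016)
finance <- data.frame(
  id     = 1:10000,
  amount = round(rexp(10000, rate = 1 / 100), 2)
)

# Draw a 1% random sample of rows, without replacement.
sample_idx     <- sample(nrow(finance), size = 0.01 * nrow(finance))
finance_sample <- finance[sample_idx, ]

nrow(finance_sample)
```

Code chunks can then be developed against `finance_sample` and rerun on the full data only once they work.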

Naturally, the challenging part and the part that took the longest time to accomplish was the data wrangling. Here is some of the ‘heavy lifting’ I did with the ‘all-munge.R’ file:
I changed a few of the variable names; shortened the candidates’ names to their last names only; restricted the data to the primaries and general elections of 2016; removed all donated amounts that were negative (-); added a column for the candidate’s party affiliation; added a column with the contributor’s gender, based on a pre-defined database that I downloaded to my computer; and added a new column with the day and the year, extracted from the contribution date column. All in all, this produced a better-organized dataset to work with when doing the EDA.
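A few of these steps can be sketched in base R as follows. This is not the contents of ‘all-munge.R’: the raw column name `candidate_full`, the sample rows, the "P2020" election code and the small party lookup are hypothetical, chosen only to illustrate the kind of transformation described.

```r
# Hypothetical raw rows; column names and values are illustrative only.
raw <- data.frame(
  candidate_full = c("Clinton, Hillary Rodham", "Trump, Donald J.",
                     "Sanders, Bernard", "Trump, Donald J."),
  amount         = c(100, -50, 27, 250),
  date           = c("2016-04-12", "2016-03-03", "2016-02-15", "2016-09-01"),
  election_tp    = c("G2016", "P2016", "P2016", "P2020")
)

# Shorten candidate names to the last name (everything before the first comma).
raw$candidate <- sub(",.*$", "", raw$candidate_full)

# Keep only the 2016 primaries and general election, and drop negative amounts.
keep    <- raw$election_tp %in% c("P2016", "G2016") & raw$amount >= 0
finance <- raw[keep, ]

# Parse the date and extract the year into its own column.
finance$date <- as.Date(finance$date)
finance$year <- as.integer(format(finance$date, "%Y"))

# Add party affiliation from a lookup keyed by last name (partial, for illustration).
party_lookup  <- c(Clinton = "Democrat", Sanders = "Democrat", Trump = "Republican")
finance$party <- unname(party_lookup[finance$candidate])

finance[, c("candidate", "amount", "date", "year", "election_tp", "party")]
```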

Removing and adding variables - I found through this project that, on one hand, you want to minimize the number of columns for the sake of speed, and on the other hand, those same variables can turn out to be meaningful further down the analysis. I had to go back and rework the munging program, adding the old variables back to the dataset.

Another challenge was choosing which libraries to use for the maps. In order to plot the distributions of variables on a map, I had to learn some new packages, like leaflet, tmap and ggmap.

Future exploration

A very interesting topic to explore in further analysis would be donations vs. votes in the 2016 elections, and projecting the result onto the 2020 elections.